Stochastic Recovery Of Sparse Signals From 
Random Measurements 
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Abstract — Sparse signal recovery from a small num- 
ber of random measurements is a well known NP-hard 
to solve combinatorial optimization problem, with im- 
portant applications in signal and image processing. 
The standard approach to the sparse signal recov- 
ery problem is based on the basis pursuit method. 
This approach requires the solution of a large con- 
vex optimization problem, and therefore suffers from 
high computational complexity. Here, we discuss a 
stochastic optimization method, as a low-complexity 
alternative to the basis pursuit approach. 

Keywords: sparse signal processing, random measure- 
ments, threshold accepting method 

1 Introduction 

There has been an increasing interest in the problem of 
sparse signal recovery from random measurements, which 
is strongly motivated by the recently introduced frame- 
work of Compressive Sensing (CS) (see [J- [7] and the ref- 
erences within). The main idea of CS is that if the signal 
is compressible, then a small number of random measure- 
ments contain sufficient information for its approximate 
or exact recovery. CS has promising applications in sig- 
nal and image processing, and it could potentially lead 
to interesting models for various interactions performed 
at the biological level jS] . 

The problem of sparse signal recovery from random pro- 
jections requires the minimization of the £q norm of the 
candidate solution. This is generally impossible to solve, 
since it requires an intractable combinatorial search. A 
common alternative is to consider the convex problem, 
known as Basis Pursuit (BP), which requires the mini- 
mization of the £± norm, as a sparsity-promoting func- 
tional. Here, we discuss a Stochastic Optimization (SO) 
method, which provides a low-complexity alternative to 
the standard Basis Pursuit (BP) approach used in CS 
PQ-J3- The considered SO method has the advantage 
of a very easy implementation, comparing to the highly 
complex BP approach. The objective function of the SO 
method is also different, and it is defined as the product 
between the t\ norm and the spectral entropy of the can- 
didate solution. This definition of the objective function 
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improves the signal reconstruction, since it is equivalent 
with a weighted l\ norm functional, where the weights 
correspond to the self-information of each component of 
the signal. Thus, by using such an objective function, 
the SO minimization method focuses on entries where the 
weights are small, and which by definition correspond to 
the non-zero components of the sparse signal. 

2 Compressive Sensing Framework 

Let us give a short description of the CS framework [k- 
[7j. Assume that we acquire a discrete signal: 



[zi, ...,z M ] T G 
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is an orthonormal basis of vectors spanning R M (here 
T stands for transposition of vectors and matrices). We 
denote by: 

*=^W|..# (M) ], (3) 

the matrix with the columns given by the basis vectors. 
Obviously, the matrix \1/ corresponds to a unitary trans- 
formation, i.e. it satisfies the orthogonality condition: 



vj/ T ip = M T =I M , 



(-1) 



where Im is the M x M identity matrix. We say that \E r 
provides a sparse representation of z, if z is well approx- 
imated by a linear combination of a small set of vectors 
from 'J, i.e. there exists a set of indices {mi, ...,mx} C 
{1, ..., M}, for small K < M, such that: 
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Therefore: 



® T z = x = [xi,...,X M ] T G 



,M 



(5) 
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is a sparse signal, representing the acquired signal z G M. M 
in the basis ^. Reciprocally, by knowing x one can easily 
synthesize z as following: 



\l>x. 



(7) 



Now, let us consider a set of vectors: 

»A/ 



3> = <jV"V (n) € 
and the corresponding matrix: 



n=l,...,JV , 



(8) 
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such that N < M. We use these vectors to collect N 
measurements of the sparse signal x: 



9 1 x = y = \yi,...,y N y er, 



(10) 



such that: 

y n = (<P {n \x) = J2vin ) Xm, n = l,...,N. (11) 
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The CS theory shows that it is possible to construct a 
set of vectors <&, such that the measurements y preserve 
the essential information about the sparse signal x, and 
therefore the sparse signal x can be recovered from the 
measurements y. More specifically, the CS theory shows 
that the matrices 'fr and 4> must satisfy the restricted 
isometry condition [T]- [TJ: 



(l-S z )\\z\L< 
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where ||-|| 2 denotes the £2 (Euclidean) norm. The re- 
stricted isometry constant 5 Z is defined as the smallest 
constant for which this property holds for all A"-sparse 



vectors z £ 



pM 



in the basis W. The matrices $ and 



M* must also satisfy the incoherence condition, which re- 
quires that the rows of $ cannot sparsely represent the 
columns of ^ (and vice versa) Q]-[7]- It has been shown 
that one can generate such a measurement matrix $ 
with high probability, if the elements of the matrix are 
drawn independently from certain probability distribu- 
tions, such as the Gaussian distribution or the Bernoulli 
distribution [T]-[7]. This is a consequence of the fact that 
in high dimensions the probability mass of certain ran- 
dom variables concentrates strongly around their expec- 
tation. Also, recent theoretic considerations have shown 
that in order to achieve the restricted isometry condition, 
any M x N matrix <& must have at least N ~ cK < M 
columns, c = c(M, K) ~ \og{M/K) > 1, in order for the 
observation y £ K to allow an accurate reconstruction 
of a: ffl-ITJ. ' 

Searching for the sparsest x that matches y, subject to the 
measurements $, leads to the £q optimization problem: 



x — arg mm 

x£M M 
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s.t. $ r a 



V- 



Here, 



A/ 



Nl = £ t 1 -<*(*«" °)1. 



(13) 
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is the £q norm, measuring the number of nonzero coeffi- 
cients in the vector x, and 



6(a,b) = 



1 if a = b 
*/ a ^ b 



(15) 



is Dirac's function. Unfortunately, this problem is known 
to be an NP-hard combinatorial optimization problem, 
requiring the enumeration of all possible collections of 
columns in the matrix $ T and searching for the smallest 
collection which best approximates the signal y PQ-[7J- 
The standard approach to the sparse recovery problem 
is based on the convexification of the objective function, 
obtained by replacing the £q norm with the £\ norm: 



Nix 
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(16) 



The resulting optimization problem: 

x — arg min llxlk , s.t. $ x = y. (17) 

x£R M 

is known as BP, and it can be solved using linear pro- 
gramming techniques whose computational complexities 
are polynomial pQ-[7j. However, in most real applications 
the BP approach requires the solution of a very large con- 
vex, non-quadratic optimization problem, and therefore 
suffers from high computational complexity PQ-[7J. 

A summarizing diagram of the CS framework is given in 
Figure 1 . During the encoding process the signal z £ M. M 
is first transformed into a sparse signal x £ M. M , using 
the M x M orthogonal transform ^> T (a typical situa- 
tion corresponds to Fourier and wavelet representations 
of natural images). In the next step the sparse signal 
x £ R M is compressed into y £ M. N , using the N x M 
random projection matrix $ T , with N < M . The decod- 
ing process requires only the compressed vector y £ M. N 
and the matrices $ and *&, and consists also in two steps. 
In the first step, the sparse signal x £ M. M is recovered by 
solving the £\ minimization problem. In the second step 
the original signal z £ M. M is synthesized from x, using 
the orthogonal transform ^>. 

3 Stochastic Optimization Approach 

Our SO approach is based on the observation formulated 
in the following theorem: 

Theorem: The solution x of the BP problem has the 
following form: 

x = x' + £, (18) 

where x' is a solution of the underdetermined linear sys- 
tem: 

(19) 



$ T x' = y, 



and £ is an unknown vector from the null subspace of the 
measurement matrix $ T . 
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*F T z 



$ T x 



Decoding 



x=argmin||x|| 

s.t. 

(J> T X=y 
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Figure 1: Compressive sensing framework. 



Proof: Let us assume that £ is a vector from the null 
subspace of the measurement matrix <3E> T , i.e. it can be 
written as a linear combination: 



£ 



Af 

£ a™g (m) , 

771=1 



(20) 



where a m G R, and g' 1 ™* 1 are the columns of the null 
subspace projection operator: 



Q=[q { 



7 (™)l 
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(21) 



Obviously we have: 

$ T Q = ^ $V m) = 0, m = 1, .., M, (22) 

and consequently: 

$ T a; = $ T (x' + C) = * T ^' = V- (23) 

Thus, a;' must be a solution of the underdetermined linear 
system: <t T x' = y. 

Reciprocally, let us assume that x' is a solution of the 
underdetermined linear system: $> T x' — y. Thus, since 
x — x' + £ is the solution of the BP problem, then it also 
satisfies the constraint & T x = y, which implies $ T £ = 0, 
and therefore £ must be a vector from the null subspace 
of $ T ■ 

A good initial solution (with minimum £2 norm) of the 
underdetermined linear system $> T x' = y, is given by: 



x' = $'y, 



where 



$t = $ (§ T §\ 



(24) 



(25) 



is the Moore- Penrose pseudo-inverse [5], since obviously 
we have: 

q> T x' = $ T $t y = $t $ {§Tq\ y = j Ny = y . (26) 

Unfortunately, it is hard to find the unique vector £ = 
x — x' from the null subspace, which corresponds to the 
unique BP solution x. This is where we use the stochastic 
search approach, in order to avoid the difficult to imple- 
ment deterministic optimization methods. 



Our approach is inspired by the Threshold Accepting 
(TA) method [10]. TA method is a deterministic ana- 
log to the well known Simulated Annealing (SA) method 
[TTj . It is a refined search procedure which escapes local 
minima by accepting solutions which are not worse by 
more than a given threshold. The algorithm is determin- 
istic in the sense that we fix a number of iterations and 
explore the neighborhood with a fixed number of ran- 
dom steps during each iteration. Analogously to the SA 
method, the threshold is decreased successively and ap- 
proaches zero in the last iteration. The main difference 
between SA and TA is that for SA the threshold is mod- 
eled as a random variable, while for TA the threshold is 
deterministic. The TA algorithm has the advantage of an 
easy parametrization, it is robust to changes in problem 
characteristics and works well for many hard problems. 

In our case, the initial solution x 4— x' is randomly 
perturbed in the null subspace of the matrix $ T , by 
adding a small random contribution of the column q( m ' , 
m = 1, ..., M , of the matrix Q: 



aaq 



(m) 



(27) 



Here a > gives the magnitude of the perturbation, and 
a is a uniform binary random variable, which provides 
the sign of the perturbation, a G {±1}. Obviously, at 
every random transition step, the new candidate solution 
x still satisfies the linear constraint: 



$ J z = $ J 



(* 



aaq 



(m) 



® T x' 



y, 



(28) 



since $ T g( m ) = 0. Thus, in the framework of l\ norm 
minimization, the new random candidate solution x is 
accepted (x <r- x) if: 



|5?lli-Nli< 



(29) 



and the process continues with a new perturbation. 



We assume that the threshold parameter 9 > and the 
magnitude of the perturbations a > 0, follow an expo- 
nential schedule, decreasing by a fixed factor A G (0, 1): 



4— X6, a <— Act, 



(30) 



at each iteration. If 6i and Of are the initial and, re- 
spectively the final values of the threshold, then we set 
A = (0f/0 i ) 1 / T , where T is the total number of iterations. 



While this approach provides a relatively simple SO 
method for l\ norm minimization, it still can be improved 
by accelerating its convergence. This can be done by re- 
placing the standard tt\ norm with a weighted l\ norm, 
which penalizes the small components of the candidate 
solution. The weighted l\ norm can be derived using the 
concept of spectral entropy, as shown below. 

We assume that x — [xi, ...,xm] T is the spectral decom- 
position of some discrete signal z G R M in the basis ^P: 



M 



Vx=J2 x ^ {m) € 



.Af 



(31) 



Thus, \x m \ represents the spectral amplitude of the com- 
ponent ip( m \ in the spectral decomposition of z G R M . 

We define the spectral signature of z G R M in the basis 
"J as following: 



p(x,i&) = \pi,...,Pm] 



(32) 



where 



Obviously, we have H(x,^>) G [0,1]. A high value 
H(x,^) ~ 1 means that the source x G (K M ,*,F) is 
just noise, i.e. all the components -0 have equiproba- 
ble amplitudes, while a smaller value < H(x,^) <C 1, 
means that the source has some intrinsic structure (or 
order), i.e. some components have a stronger amplitude 
than others. Equivalently, if x £ R M is a sparse signal, 
then its entropy will be low, while if x is not sparse then 
its entropy will be high. 

Now, let us consider the following functional, defined 
as the product between the spectral entropy and the l\ 
norm: 

F(x,*) = \\x\\ l H(x,V). (38) 

This functional corresponds to the weighted l\ norm of 
the vector x G M M : 



M 



M w i = F(x,^)= J2 w ™\ x ™\ 



(39) 



771 — 1 



where the weights are given by the self-information of 
each component: 



2^i=l \ X i 



IMIi 



, m = 1, ...,M. 



(33) 



w m = h(x m ,ip {m) ). 



(40) 



We also define a probability measure P for x by: 

P(x m ,^)=p m , (34) 

which means that p m is the probability that the spec- 
tral decomposition of z G M M in the basis "J/, has the 
component ip^ m ' with an amplitude \x m \. Obviously, we 
have: 

M 

^2 Pm. = 1, (35) 

m— 1 

and p is a probability distribution, associated with x. 

Therefore, x can be modeled as a random variable in the 
probability space (IR M ,^/,P). Also, x can be viewed as 
an information source, with its statistics defined by p. 

Since x G (R , ^, P) may be viewed as an information 
source, we can further define its self-information provided 
by the component ip^" 1 ^ as following [121113]: 

h{x m ,ip {m) ) = -\og M P{x m ,^ m) ) > 0. (36) 



By replacing the standard i\ norm, \\x\\i, with its 
weighted version, HxH^, one penalizes the small com- 
ponents, x m , of the candidate solution, since: 



w m = hm log M - 

|s m |->0 \x m \ 



CO. 



(41) 



Thus, in the stochastic iterative process, the solution will 
concentrate around the small weights, which correspond 
to the non-zero components of the sparse signal. 

In order to avoid the singularity in the weight estimation, 
we introduce a small parameter < e <§C 1/M, such that: 



IV., 



Mil + -Me n 

logM ■ | , > 0. 



(42) 



Therefore, this analysis shows that from the point of view 
of solving the sparse signal recovery problem we need 
to find the source x G (R M ,^,P) which minimizes the 
weighted l\ norm, subject to a linear constraint system, 
imposed by the random measurements: 



The average self-information, over all components of the 
basis Vt, defines the spectral entropy of the source x G 
(R M ,*,P) [12J [13]: 



A I 



H(x, tf) = J2 p (^' 4> {m) )Hx m , V M ). (37) 



x — arg min ||a;|| M;1 , s.t. $ x — y. (43) 

Using the same threshold accepting approach as before, 
a new candidate solution x is accepted (x <— x) if: 



X \\wl-\\ X \ 



W 1 



< 



(44) 



In the case oip m = 0, we consider p m log M p m = 0, which 
is consistent with the limit: lim p ^ + P^°&mP = 0- 



Thus, the pseudo-code of the proposed SO algorithm can 
formulated as following: 



# SO sparse signal recovery algorithm: 

# Initialize the parameters 
6i, Of , a i: T; 

# Compute A 

a = (0/M) 1/t ; 

# Initialize the solution 
x <r- &y; 

# Compute the null subspace projection operator 

Q ± <- I M - $ ($ T $) ' $ T ; 

# Set the initial parameters 
9 <— Oi] a ■£- af, 

FOR(i = 1,...,T){ 
FOR(m = l,...,M){ 
^Compute a candidate solution 
(J «— sign(rnd(l) — 0.5); 
a^<— a; + crag^" 1 ); 

# Test the solution 

IF(F -F <9) THEN{ir <- z; F f- F;}} 

# Compute the new parameters 
6* «- A#; a <- Aa;} 

# Return the approximation of x 
RETURN x; 



4 Numerical Results 

We have implemented the above SO algorithm in C on 
a Linux PC, using the GNU Scientific Library [TJ]. In 
Figure 2 we give several examples of sparse signal recov- 
ery using the SO method. The non-zero coefficients of 
the sparse signal x £ R M are uniformly random gener- 
ated in the interval [—1,1]. Also, the elements of the 
measurement matrix <fm are drawn from the Gaussian 
distribution r(0, 1). In these particular examples, the 
length of the sparse signal x £ M M and the number of 
measurements were set to M — 100, and respectively 
N = 50. The initial and final thresholds were set to 
9i = 0.5, and respectively Of = 10~ 5 . Also we used 
ai = 1, and A = 0.95, such that the number of itera- 
tions are T = ]n(6 f / 9$ / \n(X) = 300. The sparsity of 
the signal was varied as following: K = 20, 25, 30. One 
can see that by increasing K , and keeping the number of 
measurements constant, the recovery error 



K=20, red=original, blue=recovered 



E = 100 x ||x 



recovered 



•£ original \\ / H^o 



dl(%) (45) 



deteriorates: E = 8.532 • 10~ 4 % for K = 20; E = 3.145% 
for K = 25; E = 11.329% for K = 30. This result is 
expected, since the fixed number of measurements cannot 
hold enough information about the increasing number of 
non-zero elements. Also, this result suggests that there is 
a phase transition in the error function, depending on the 
ratio between the sparsity parameter K and the number 
of measurements N. 




K=25, red=original, blue=recovered 




K=30, red=original, blue=recovered 




Figure 2: Examples of sparse signal recovery. 



The phase transition is illustrated in Figure 3, where 
we give the relative recovery error E = E(K,N), as a 
function of the sparsity K and the number of measure- 
ments N, obtained by averaging over 100 instances for 
each value of K and N. One can see that there is a 
large area (the dark color) where the method performs 
very well, and recovery is done with an error of less than 
10%. Also, the contour lines, corresponding to a fixed 
error E, have a logarithmic dependence, similar to the 
ones obtained with the BP approach. 

It is interesting to note that the measured vector y is 
used only once in the whole recovery process, to find the 
admissible initial solution, which satisfies the linear con- 
straint system. After this initial step, the algorithm sim- 
ply perturbs the solution such that the new candidate so- 
lution always satisfies the linear constraint system. This 
approach reduces drastically the search space from R M 
to M. R , where R — M — N is the rank of the null sub- 
space operator Q. As expected, by increasing the number 
of measurements N, the dimensionality R of the search 
space is reduced and the method will perform better. 
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Figure 3: The phase transition in the average recovery 
error, E = E(K,N). 



5 Conclusion 



We have presented a SO approach to the problem of 
sparse signal recovery encountered in the CS framework. 
The considered SO method has the advantage of a very 
easy implementation, comparing to the highly complex 
BP standard approach, used in CS framework. Thus, the 
proposed method is well suited for applications in which 
a simple implementation, with an approximate recovery 
performance of the original sparse signals, is acceptable. 
The objective function of the SO method is defined as 
the product between the l\ norm and the spectral en- 
tropy of the candidate solution, and it is equivalent with 
a weighted £\ norm functional, where the weights corre- 
spond to the self-information of each component of the 
signal. As a consequence of using such an objective func- 
tion, the convergence of the SO minimization method is 
improved, since it focuses on entries where the weights 
are small, which by definition correspond to the non-zero 
components of the sparse signal. 
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